I am using BERT for multiclass text classification. There are 116 output classes to predict from, and I see a high degree of class imbalance in the data.
Here are the record counts available for some of the classes:
{"Class A": 975 records,
"Class B": 776 records,
"Class C": 533 records,
"Class D": 412 records,
"Class E": 302 records,
"Class F": 250 records,
"Class G": 207 records,
"Class H": 137 records,
"Class I": 96 records,
"Class J": 51 records,
"Class K": 28 records,
"Class L": 17 records,
"Class M": 7 records,
"Class N": 2 records}
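To give a sense of the scale of the imbalance, here is a rough inverse-frequency weighting of these counts (a sketch using the "balanced" heuristic, total / (n_classes * count), restricted to just the 14 classes listed above):

```python
# Inverse-frequency class weights from the counts above
# ("balanced" heuristic: total / (n_classes * count)).
counts = {
    "Class A": 975, "Class B": 776, "Class C": 533, "Class D": 412,
    "Class E": 302, "Class F": 250, "Class G": 207, "Class H": 137,
    "Class I": 96, "Class J": 51, "Class K": 28, "Class L": 17,
    "Class M": 7, "Class N": 2,
}
total = sum(counts.values())  # 3793
weights = {c: total / (len(counts) * n) for c, n in counts.items()}
print(round(weights["Class A"], 2))  # ~0.28   (majority class, down-weighted)
print(round(weights["Class N"], 2))  # ~135.46 (rarest class, heavily up-weighted)
```

So the rarest class here would need to be weighted roughly 500x more than the most frequent one.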
So I have two questions here:
Question 1: With 116 output classes to predict from, does the high number of classes hurt BERT's performance?
Question 2: My full dataset follows the same kind of class distribution illustrated above. How does this imbalance affect BERT's performance, and if it does, how do we handle it to get proper output?
Looking forward to answers from the talented community we have here.
@swagat1509 Were you able to solve this? I have the same scenario: around 106 classes and a highly imbalanced dataset, with 23k records for some classes and only 2 records for others. I tried different models (distilbert-base-uncased, bert-base, deberta, roberta, bigbird) with different hyperparameter combinations and different loss functions (focal loss, weighted loss, etc.), but I am not able to break the 84% accuracy mark. Please reply if possible. Also, if someone else can help with this scenario, your help would be greatly appreciated.
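For reference, this is roughly what I mean by "weighted loss" / "focal loss": a minimal sketch that overrides compute_loss in a Hugging Face Trainer. The class name WeightedLossTrainer and the class_weights / gamma parameters are placeholders for your own setup, not a library API:

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    """Trainer with weighted cross-entropy; set gamma > 0 for focal loss."""

    def __init__(self, *args, class_weights=None, gamma=0.0, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights  # 1-D float tensor, one weight per class
        self.gamma = gamma                  # focal-loss focusing parameter

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        weights = (self.class_weights.to(logits.device)
                   if self.class_weights is not None else None)
        # Per-example cross-entropy, weighted per class if weights are given.
        ce = F.cross_entropy(logits, labels, weight=weights, reduction="none")
        if self.gamma > 0:
            # Focal loss: down-weight easy examples by (1 - p_t)^gamma.
            pt = torch.exp(-F.cross_entropy(logits, labels, reduction="none"))
            ce = (1 - pt) ** self.gamma * ce
        loss = ce.mean()
        return (loss, outputs) if return_outputs else loss
```

The class_weights tensor can come from inverse-frequency counts like the ones earlier in this thread, e.g. torch.tensor(list(weights.values()), dtype=torch.float).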